Abstract

We formulate an optimal transport problem for matrix-valued density functions. This is pertinent in the spectral analysis of multivariable time-series. The "mass" represents energy at various frequencies whereas, in addition to a usual transportation cost across frequencies, a cost of rotation is also taken into account. We show that it is natural to seek the transportation plan in the tensor product of the spaces for the two matrix-valued marginals. In contrast to the classical Monge-Kantorovich setting, the transportation plan is no longer supported on a thin zero-measure set.


I. Introduction 

The formulation of optimal mass transport (OMT) goes back to the work of G. Monge in 1781 [1]. The modern formulation is due to Kantorovich in 1947 [2]. In recent years the subject has been evolving rapidly due to the wide range of applications in economics, theoretical physics, probability, etc. Important recent monographs on the subject include [3], [4], [5].

Our interest in the subject of matrix-valued transport originates in the spectral analysis of multivariable time-series. It is natural to consider the weak topology for power spectra, because statistics typically represent integrals of power spectra and hence a suitable form of continuity is desirable. Optimal mass transport and the geometry of the Wasserstein metric provide a natural framework for studying scalar densities. Thus, the scalar OMT theory was used in [6] for modeling slowly time-varying changes in the power spectra of time-series. The salient feature of matrix-valued densities is that power can shift across frequencies as well as across different channels via rotation of the corresponding eigenvectors. Thus, transport between matrix-valued densities requires that we take into account the cost of rotation as well as the cost of shifting power across frequencies.

Besides the formulation of a "non-commutative" Monge-Kantorovich transportation problem, the main results in the paper are that (1) the solution to our problem can be cast as a convex-optimization problem, (2) geodesics can be determined by convex programming, and (3) the optimal transport plan has support which, in contrast to the classical Monge-Kantorovich setting, is no longer contained in a thin zero-measure set.



II. Preliminaries on Optimal Mass Transport 

Consider two probability density functions $\mu_0$ and $\mu_1$ supported on $\mathbb{R}$. Let $M(\mu_0,\mu_1)$ be the set of probability measures $m(x,y)$ on $\mathbb{R}\times\mathbb{R}$ with $\mu_0$ and $\mu_1$ as marginal density functions, i.e.,

$$\int_{\mathbb{R}} m(x,y)\,dy = \mu_0(x), \qquad \int_{\mathbb{R}} m(x,y)\,dx = \mu_1(y), \qquad m(x,y) \ge 0.$$

The set $M(\mu_0,\mu_1)$ is not empty since $m(x,y) = \mu_0(x)\mu_1(y)$ is always a feasible solution. Probability densities can be thought of as distributions of mass, and $c(x,y)$ as the cost of transferring one unit of mass from location $x$ to location $y$. For $c(x,y) = |x-y|^2$ the optimal transport cost gives rise to the 2-Wasserstein metric

$$W_2(\mu_0,\mu_1) = \mathcal{T}_2(\mu_0,\mu_1)^{1/2},$$

where

$$\mathcal{T}_2(\mu_0,\mu_1) := \inf_{m\in M(\mu_0,\mu_1)} \int_{\mathbb{R}\times\mathbb{R}} c(x,y)\,m(x,y)\,dx\,dy. \tag{1}$$
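For finitely supported marginals, (1) becomes a finite linear program over the entries of $m$. The following is a minimal sketch of our own, not code from this work; it assumes SciPy's `scipy.optimize.linprog` with the HiGHS backend, and the helper name `kantorovich_lp` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich_lp(x, mu0, y, mu1):
    """Discrete version of problem (1) with c(x, y) = |x - y|^2.

    x, mu0: support points and masses of the first marginal.
    y, mu1: support points and masses of the second marginal.
    Returns the optimal cost and the optimal plan m[i, j]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    n0, n1 = len(x), len(y)
    cost = (x[:, None] - y[None, :]) ** 2
    # The plan is flattened row-major: m[i, j] -> entry i * n1 + j.
    row_sums = np.kron(np.eye(n0), np.ones((1, n1)))  # sum_j m[i, j] = mu0[i]
    col_sums = np.kron(np.ones((1, n0)), np.eye(n1))  # sum_i m[i, j] = mu1[j]
    res = linprog(cost.ravel(),
                  A_eq=np.vstack([row_sums, col_sums]),
                  b_eq=np.concatenate([mu0, mu1]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(n0, n1)


x, mu0 = [0.0, 1.0, 2.0, 3.0], [0.25] * 4
y, mu1 = [0.5, 4.0, -1.0, 2.0], [0.25] * 4
cost, plan = kantorovich_lp(x, mu0, y, mu1)
print(cost)  # approximately 0.5625, the cost of pairing the points in sorted order
```

With equally weighted atoms, the optimal plan pairs the atoms in sorted order, a discrete counterpart of the monotone support discussed below.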
Problem (1) is a linear programming problem with dual

$$\sup_{\phi_0,\phi_1}\left\{ \int_{\mathbb{R}} \big(\phi_0\mu_0 - \phi_1\mu_1\big)\,dx \;\middle|\; \phi_0(x) - \phi_1(y) \le c(x,y) \right\}, \tag{2}$$

see, e.g., [?]. Moreover, for the quadratic cost function $c(x,y) = |x-y|^2$, $\mathcal{T}_2(\mu_0,\mu_1)$ can also be written explicitly in terms of the cumulative distribution functions

$$F_i(x) = \int_{-\infty}^{x} \mu_i\,dx, \quad \text{for } i = 0, 1,$$

as follows (see [3, page 75]):

$$\mathcal{T}_2(\mu_0,\mu_1) = \int_0^1 \big|F_0^{-1}(t) - F_1^{-1}(t)\big|^2\,dt, \tag{3}$$

and the optimal joint probability density $m \in M(\mu_0,\mu_1)$ has support on $(x, T(x))$, where $T(x)$ is the subdifferential of a convex lower semi-continuous function. More specifically, $T(x)$ is uniquely defined by

$$F_0(x) = F_1(T(x)). \tag{4}$$

Finally, a geodesic $\mu_\tau$ ($\tau \in [0,1]$) between $\mu_0$ and $\mu_1$ can be written explicitly in terms of the cumulative function $F_\tau$ defined by

$$F_\tau\big((1-\tau)x + \tau T(x)\big) = F_0(x). \tag{5}$$

Then, clearly,

$$W_2(\mu_0,\mu_\tau) = \tau\, W_2(\mu_0,\mu_1), \qquad W_2(\mu_\tau,\mu_1) = (1-\tau)\, W_2(\mu_0,\mu_1).$$
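For two empirical measures on $\mathbb{R}$ with the same number of equally weighted atoms, the quantile functions in (3) take the sorted sample values, the map $T$ in (4) pairs the $k$-th smallest atoms, and (5) moves each atom along a straight line. The sketch below assumes this equal-weight setting (an assumption, not the general densities above); the function names are hypothetical.

```python
import numpy as np


def w2_empirical(x, y):
    """W_2 between two empirical measures on R with the same number of equally
    weighted atoms. By (3), the quantile functions F_0^{-1}, F_1^{-1} are step
    functions whose values are the sorted samples."""
    xs, ys = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((xs - ys) ** 2)))


def displacement_interpolation(x, y, tau):
    """Atoms of mu_tau as in (5): the monotone map T of (4) pairs the k-th
    smallest atoms, and each atom moves to (1 - tau) x + tau T(x)."""
    xs, ys = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return (1.0 - tau) * xs + tau * ys


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(3.0, 0.5, size=500)
d = w2_empirical(x, y)
for tau in (0.25, 0.5, 0.75):
    z = displacement_interpolation(x, y, tau)
    # The two ratios reproduce tau and 1 - tau, as stated after (5).
    print(tau, w2_empirical(x, z) / d, w2_empirical(z, y) / d)
```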

III. Matrix-valued Optimal Mass Transport
We consider the family

$$\mathcal{F}_n := \left\{ \mu \;\middle|\; \text{for } x\in\mathbb{R},\ \mu(x) \in \mathbb{C}^{n\times n} \text{ Hermitian},\ \mu(x) \ge 0,\ \operatorname{tr}\Big(\int_{\mathbb{R}} \mu(x)\,dx\Big) = 1 \right\}$$

of Hermitian positive semi-definite, matrix-valued densities on $\mathbb{R}$, normalized so that their trace integrates to 1. We motivate a transportation cost for this matrix-valued setting and introduce a generalization of the Monge-Kantorovich OMT to matrix-valued densities.

A. Tensor product and partial trace 

Consider two $n$-dimensional Hilbert spaces $\mathcal{H}_0$ and $\mathcal{H}_1$ with bases $\{u_1,\ldots,u_n\}$ and $\{v_1,\ldots,v_n\}$, respectively. Let $\mathcal{L}(\mathcal{H}_0)$ and $\mathcal{L}(\mathcal{H}_1)$ denote the spaces of linear operators on $\mathcal{H}_0$ and $\mathcal{H}_1$, respectively. For $\rho_0 \in \mathcal{L}(\mathcal{H}_0)$ and $\rho_1 \in \mathcal{L}(\mathcal{H}_1)$, we denote their tensor product by $\rho_0\otimes\rho_1 \in \mathcal{L}(\mathcal{H}_0\otimes\mathcal{H}_1)$. Formally, the latter is defined via

$$\rho_0\otimes\rho_1 :\ u\otimes v \mapsto \rho_0 u \otimes \rho_1 v.$$

Since our spaces are finite-dimensional, this is precisely the Kronecker product of the corresponding matrix representations of the two operators.

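Consider $\rho \in \mathcal{L}(\mathcal{H}_0\otimes\mathcal{H}_1)$, which can be thought of as a matrix of size $n^2\times n^2$. The partial traces $\operatorname{tr}_{\mathcal{H}_1}$ and $\operatorname{tr}_{\mathcal{H}_0}$, or $\operatorname{tr}_1$ and $\operatorname{tr}_0$ for brevity, are linear maps

$$\rho \in \mathcal{L}(\mathcal{H}_0\otimes\mathcal{H}_1) \mapsto \operatorname{tr}_1(\rho) \in \mathcal{L}(\mathcal{H}_0), \qquad \rho \mapsto \operatorname{tr}_0(\rho) \in \mathcal{L}(\mathcal{H}_1)$$

that are defined as follows. Partition $\rho$ into $n\times n$ block-entries and denote by $\rho_{k\ell}$ the $(k,\ell)$-th block ($1\le k,\ell\le n$). Then the partial trace, e.g.,

$$\rho_0 := \operatorname{tr}_1(\rho)$$

is the $n\times n$ matrix with

$$[\rho_0]_{k\ell} = \operatorname{tr}(\rho_{k\ell}), \quad \text{for } 1\le k,\ell\le n.$$

The partial trace

$$\rho_1 := \operatorname{tr}_0(\rho)$$

is defined in a similar manner for a corresponding partition of $\rho$, see, e.g., [?]. More specifically, for $1\le i,j\le n$, let $\rho^{ij}$ be the $n\times n$ sub-matrix of $\rho$ with $(k,\ell)$-th entry $[\rho^{ij}]_{k\ell} = [\rho_{k\ell}]_{ij}$. Then the $(i,j)$-th entry of $\rho_1$ is

$$[\rho_1]_{ij} = \operatorname{tr}(\rho^{ij}).$$

Thus

$$\operatorname{tr}_1(\rho_0\otimes\rho_1) = \operatorname{tr}(\rho_1)\,\rho_0 \quad\text{and}\quad \operatorname{tr}_0(\rho_0\otimes\rho_1) = \operatorname{tr}(\rho_0)\,\rho_1.$$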
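The block definitions above translate directly into array indexing. A minimal sketch, assuming NumPy's Kronecker ordering (entry `rho[k*n + i, l*n + j]` is entry $(i,j)$ of block $\rho_{k\ell}$); the helper names are hypothetical. The usage part also checks the defining property of the tensor product and the two identities just stated.

```python
import numpy as np


def partial_traces(rho, n):
    """Return (tr_1(rho), tr_0(rho)) for an n^2 x n^2 matrix rho on H_0 (x) H_1.

    Reshaping to r[k, i, l, j] = [rho_{kl}]_{ij} turns both partial traces
    into index contractions."""
    r = np.asarray(rho).reshape(n, n, n, n)
    tr1 = np.einsum("kili->kl", r)  # [tr_1 rho]_{kl} = tr(rho_{kl})
    tr0 = np.einsum("kikj->ij", r)  # [tr_0 rho]_{ij} = tr(rho^{ij})
    return tr1, tr0


def random_density(rng, n):
    """Random Hermitian positive semi-definite n x n matrix with unit trace."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    a = g @ g.conj().T
    return a / np.trace(a).real


rng = np.random.default_rng(0)
n = 3
rho0, rho1 = random_density(rng, n), 2.0 * random_density(rng, n)
t1, t0 = partial_traces(np.kron(rho0, rho1), n)
print(np.allclose(t1, np.trace(rho1) * rho0), np.allclose(t0, np.trace(rho0) * rho1))

# Defining property of the tensor product: (rho0 (x) rho1)(u (x) v) = rho0 u (x) rho1 v.
u, v = rng.normal(size=n), rng.normal(size=n)
print(np.allclose(np.kron(rho0, rho1) @ np.kron(u, v), np.kron(rho0 @ u, rho1 @ v)))
```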



B. Joint density for matrix-valued distributions

A naive attempt to define a joint probability density given marginals $\mu_0,\mu_1 \in \mathcal{F}_n$ is to consider a matrix-valued density $m$ with support on $\mathbb{R}\times\mathbb{R}$ such that $m \ge 0$ and

$$\int_{\mathbb{R}} m(x,y)\,dy = \mu_0(x), \qquad \int_{\mathbb{R}} m(x,y)\,dx = \mu_1(y). \tag{6}$$

However, in contrast to the scalar case, this constraint is not always feasible. To see this, consider … $\delta(x - x_2)$.

It is easy to show that (6) cannot be met.
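A minimal instance of this failure (our own illustration, which need not coincide with the example intended above) uses single point masses with different orientations:

```latex
\mu_0(x) = E\,\delta(x - x_1), \qquad \mu_1(y) = F\,\delta(y - y_1), \qquad
E, F \ge 0,\ \operatorname{tr}(E) = \operatorname{tr}(F) = 1,\ E \neq F.

% Any m >= 0 satisfying (6) has zero trace, hence vanishes, off x = x_1 and off y = y_1:
m(x,y) = M\,\delta(x - x_1)\,\delta(y - y_1).

% The two conditions in (6) then read
\int_{\mathbb{R}} m(x,y)\,dy = M\,\delta(x - x_1) = E\,\delta(x - x_1) \;\Rightarrow\; M = E,
\qquad
\int_{\mathbb{R}} m(x,y)\,dx = M\,\delta(y - y_1) = F\,\delta(y - y_1) \;\Rightarrow\; M = F,

% which contradicts E \neq F.
```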

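A natural class of joint densities $m$ that can serve as transportation plans is defined as follows. For $(x,y)\in\mathbb{R}\times\mathbb{R}$,

$$m(x,y) \text{ is an } n^2\times n^2 \text{ positive semi-definite matrix}, \tag{7a}$$

and with

$$m_0(x,y) := \operatorname{tr}_1(m(x,y)), \qquad m_1(x,y) := \operatorname{tr}_0(m(x,y)), \tag{7b}$$

one has

$$\int_{\mathbb{R}} m_0(x,y)\,dy = \mu_0(x), \qquad \int_{\mathbb{R}} m_1(x,y)\,dx = \mu_1(y). \tag{7c}$$

Thus, we denote by

$$M(\mu_0,\mu_1) := \{ m \mid \text{(7a)-(7c) are satisfied} \}.$$

For this family, given marginals, there is always an admissible joint distribution, as stated in the following proposition.

Proposition 1: For any $\mu_0,\mu_1 \in \mathcal{F}_n$, the set $M(\mu_0,\mu_1)$ is not empty.

Proof: Clearly, $m := \mu_0\otimes\mu_1 \in M(\mu_0,\mu_1)$, since $\operatorname{tr}_1(\mu_0(x)\otimes\mu_1(y)) = \operatorname{tr}(\mu_1(y))\,\mu_0(x)$ and $\int_{\mathbb{R}}\operatorname{tr}(\mu_1(y))\,dy = 1$, and similarly for $\operatorname{tr}_0$. ■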
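The product plan of Proposition 1 can be checked on a grid, where sums replace the integrals in (7c). This is a sketch under stated assumptions: the marginals are hypothetical random data with real symmetric PSD weights at `N` grid points, and the helper names are ours.

```python
import numpy as np


def partial_traces(rho, n):
    """(tr_1(rho), tr_0(rho)) for an n^2 x n^2 matrix in Kronecker ordering."""
    r = np.asarray(rho).reshape(n, n, n, n)
    return np.einsum("kili->kl", r), np.einsum("kikj->ij", r)


def random_density_on_grid(rng, num_points, n):
    """Hypothetical discretised marginal: PSD n x n weights at num_points grid
    points, normalised so that the total trace is 1."""
    g = rng.normal(size=(num_points, n, n))
    w = g @ np.transpose(g, (0, 2, 1))
    return w / np.trace(w, axis1=1, axis2=2).sum()


rng = np.random.default_rng(1)
n, N = 2, 5
mu0, mu1 = random_density_on_grid(rng, N, n), random_density_on_grid(rng, N, n)

# Product plan of Proposition 1: m[i, j] = mu0[i] (x) mu1[j].
m = np.array([[np.kron(mu0[i], mu1[j]) for j in range(N)] for i in range(N)])

# Discrete (7b)-(7c): sum_j tr_1 m[i, j] = mu0[i] and sum_i tr_0 m[i, j] = mu1[j].
m0 = np.array([[partial_traces(m[i, j], n)[0] for j in range(N)] for i in range(N)])
m1 = np.array([[partial_traces(m[i, j], n)[1] for j in range(N)] for i in range(N)])
print(np.allclose(m0.sum(axis=1), mu0), np.allclose(m1.sum(axis=0), mu1))
```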

We next motivate a natural form for the transportation cost. This is a functional on the joint density as in the 
scalar case. However, besides a penalty on "linear" transport we now take into account an "angular" penalty as 
well. 



C. Transportation cost 

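We interpret $\operatorname{tr}(m(x,y))$ as the amount of "mass" that is being transferred from $x$ to $y$. Thus, for a scalar cost function $c(x,y)$ as before, one may simply consider

$$\min_{m\in M(\mu_0,\mu_1)} \int_{\mathbb{R}\times\mathbb{R}} c(x,y)\,\operatorname{tr}(m(x,y))\,dx\,dy. \tag{8}$$

However, if $\operatorname{tr}(\mu_0(x)) = \operatorname{tr}(\mu_1(x))$ for all $x\in\mathbb{R}$, then the optimal value of (8) is zero (for any cost with $c(x,x) = 0$, such as $|x-y|^2$), even when $\mu_0 \neq \mu_1$. Thus (8) fails to quantify mismatch in the matricial setting.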

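For simplicity, throughout, we only consider marginals $\mu$ which pointwise satisfy $\operatorname{tr}(\mu) > 0$. Then $\operatorname{tr}(\mu(x))$ is a scalar-valued density representing mass at location $x$, while $\mu(x)/\operatorname{tr}(\mu(x))$ has trace 1 and contains directional information. Likewise, for a joint density $m(x,y)$, assuming $m(x,y) \neq 0$, we consider

$$\widetilde{\operatorname{tr}}_0(m(x,y)) := \operatorname{tr}_0(m(x,y))/\operatorname{tr}(m(x,y)), \qquad \widetilde{\operatorname{tr}}_1(m(x,y)) := \operatorname{tr}_1(m(x,y))/\operatorname{tr}(m(x,y)).$$

Since $\widetilde{\operatorname{tr}}_0(m(x,y))$ and $\widetilde{\operatorname{tr}}_1(m(x,y))$ are normalized to have unit trace, their difference captures the directional mismatch between the two partial traces. Thus we take

$$\operatorname{tr}\big(\|(\widetilde{\operatorname{tr}}_0 - \widetilde{\operatorname{tr}}_1)m(x,y)\|_F^2\, m(x,y)\big)$$

to quantify the rotational mismatch. The above motivates the following cost functional that includes both terms, rotational and linear:

$$\operatorname{tr}\Big(\big(c(x,y) + \lambda\|(\widetilde{\operatorname{tr}}_0 - \widetilde{\operatorname{tr}}_1)m(x,y)\|_F^2\big)\, m(x,y)\Big),$$

where $\lambda > 0$ can be used to weigh the relative significance of the two terms.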

D. Optimal transportation problem 

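In view of the above, we now arrive at the following formulation of a matrix-valued version of the OMT, namely the determination of

$$\mathcal{T}_{2,\lambda}(\mu_0,\mu_1) := \min_{m\in M(\mu_0,\mu_1)} \int_{\mathbb{R}\times\mathbb{R}} \operatorname{tr}\Big(\big(c + \lambda\|(\widetilde{\operatorname{tr}}_0 - \widetilde{\operatorname{tr}}_1)m\|_F^2\big)\,m\Big)\,dx\,dy. \tag{9}$$

Interestingly, (9) can be cast as a convex optimization problem. We explain this next.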
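Since, by definition,

$$\widetilde{\operatorname{tr}}_0(m)\operatorname{tr}(m) = \operatorname{tr}_0(m), \qquad \widetilde{\operatorname{tr}}_1(m)\operatorname{tr}(m) = \operatorname{tr}_1(m),$$

we deduce that

$$\|(\widetilde{\operatorname{tr}}_0 - \widetilde{\operatorname{tr}}_1)m\|_F^2\operatorname{tr}(m) = \frac{\|(\operatorname{tr}_0 - \operatorname{tr}_1)m\|_F^2}{\operatorname{tr}(m)^2}\operatorname{tr}(m) = \frac{\|(\operatorname{tr}_0 - \operatorname{tr}_1)m\|_F^2}{\operatorname{tr}(m)}.$$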

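Now let $\bar m(x,y) := \operatorname{tr}(m(x,y))$ and let $m_0(x,y)$ and $m_1(x,y)$ be as in (7b). The expression for the optimal cost in (9) can be equivalently written as

$$\min_{\bar m,\,m_0,\,m_1} \left\{ \int_{\mathbb{R}\times\mathbb{R}} \left( c(x,y)\,\bar m(x,y) + \lambda\,\frac{\|m_0(x,y) - m_1(x,y)\|_F^2}{\bar m(x,y)} \right) dx\,dy \;\middle|\; \begin{array}{l} m_0(x,y),\ m_1(x,y) \ge 0, \\ \operatorname{tr}(m_0(x,y)) = \operatorname{tr}(m_1(x,y)) = \bar m(x,y), \\ \int_{\mathbb{R}} m_0(x,y)\,dy = \mu_0(x), \\ \int_{\mathbb{R}} m_1(x,y)\,dx = \mu_1(y) \end{array} \right\}. \tag{10}$$

Indeed, any feasible $m_0, m_1$ are the partial traces of $m := m_0\otimes m_1/\bar m \in M(\mu_0,\mu_1)$, which incurs the same cost.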
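The step from (9) to (10) only uses the pointwise identity derived above. A small numerical check of that identity for one random PSD value of $m(x,y)$ follows; it is a sketch with hypothetical helper names and arbitrary values $c = 0.7$, $\lambda = 0.1$.

```python
import numpy as np


def partial_traces(rho, n):
    """(tr_1(rho), tr_0(rho)) for an n^2 x n^2 matrix in Kronecker ordering."""
    r = np.asarray(rho).reshape(n, n, n, n)
    return np.einsum("kili->kl", r), np.einsum("kikj->ij", r)


def integrand_9(m, c, lam, n):
    """Integrand of (9): tr((c + lam * ||(tr~_0 - tr~_1) m||_F^2) m)."""
    t = np.trace(m).real
    tr1, tr0 = partial_traces(m, n)
    mismatch = np.linalg.norm(tr0 / t - tr1 / t, "fro") ** 2
    return (c + lam * mismatch) * t  # the scalar factor times tr(m)


def integrand_10(m0, m1, mbar, c, lam):
    """Integrand of (10): c * mbar + lam * ||m_0 - m_1||_F^2 / mbar."""
    return c * mbar + lam * np.linalg.norm(m0 - m1, "fro") ** 2 / mbar


rng = np.random.default_rng(2)
n = 2
g = rng.normal(size=(n * n, n * n)) + 1j * rng.normal(size=(n * n, n * n))
m = g @ g.conj().T                # a random PSD value of m(x, y)
m0, m1 = partial_traces(m, n)     # (7b): m_0 = tr_1(m), m_1 = tr_0(m)
mbar = np.trace(m).real
print(integrand_9(m, 0.7, 0.1, n), integrand_10(m0, m1, mbar, 0.7, 0.1))
```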



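Since, for $x > 0$,

$$\frac{(y-z)^2}{x}$$

is jointly convex in the arguments $x, y, z$ (and so is $\|Y-Z\|_F^2/x$ for matrix arguments $Y, Z$, being a sum of such terms over the real and imaginary parts of the entries), it readily follows that the integral in (10) is a convex functional. All constraints in (10) are also convex and, therefore, so is the optimization problem.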
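Joint convexity of the quadratic-over-linear function is what makes (10) convex. A randomized spot-check of Jensen's inequality (a numerical illustration, not a proof), with $2\times 2$ real matrix arguments chosen as an assumption for simplicity:

```python
import numpy as np


def quad_over_lin(x, y, z):
    """f(x, y, z) = ||y - z||^2 / x for x > 0; y, z may be scalars or matrices."""
    return float(np.sum(np.abs(np.asarray(y) - np.asarray(z)) ** 2) / x)


rng = np.random.default_rng(3)
worst_gap = -np.inf
for _ in range(10_000):
    xa, xb = rng.uniform(0.01, 2.0, size=2)
    ya, yb, za, zb = rng.normal(size=(4, 2, 2))
    th = rng.uniform()
    lhs = quad_over_lin(th * xa + (1 - th) * xb,
                        th * ya + (1 - th) * yb,
                        th * za + (1 - th) * zb)
    rhs = th * quad_over_lin(xa, ya, za) + (1 - th) * quad_over_lin(xb, yb, zb)
    worst_gap = max(worst_gap, lhs - rhs)  # convexity means lhs - rhs <= 0
print(worst_gap)
```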

IV. On the geometry of Optimal Mass Transport 

A standard result in the (scalar) OMT theory is that the optimal transportation plan is supported on the graph of the subdifferential of a convex function. As a consequence, the transportation plan has support only on a monotonically non-decreasing zero-measure set. This is no longer true for the optimal transportation plan for matrix-valued density functions, as we discuss next.

In optimal transport theory for scalar-valued distributions, the optimal transportation plan has a certain cyclically monotonic property [3]. More specifically, if $(x_1,y_1)$, $(x_2,y_2)$ are two points where the transportation plan has support, then $x_2 > x_1$ implies $y_2 \ge y_1$. The interpretation is that optimal transportation paths do not cross. For the case of matrix-valued distributions as in (9), this property may not hold in the same way. However, interestingly, a weaker monotonicity property holds for the supporting set of the optimal matrix transportation plan. The property is defined next and the precise statement is given in Proposition 3 below.

Definition 2: A set $S \subset \mathbb{R}^2$ is called $\lambda$-monotonically non-decreasing, for $\lambda \ge 0$, if for any two points $(x_1,y_1), (x_2,y_2) \in S$, it holds that

$$(x_2 - x_1)(y_1 - y_2) \le \lambda.$$

A geometric interpretation for a $\lambda$-monotonically non-decreasing set is that if $(x_1,y_1), (x_2,y_2) \in S$ and $x_2 > x_1$, $y_1 > y_2$, then the area of the rectangle with vertices $(x_i,y_j)$ ($i,j\in\{1,2\}$) is not larger than $\lambda$. The transportation plan of the scalar-valued optimal transportation problem with a quadratic cost has support on a 0-monotonically non-decreasing set.
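For a finite set of support points, the smallest $\lambda$ for which Definition 2 holds is the largest rectangle area spanned by a crossing pair. A minimal sketch with hypothetical function names and toy point sets:

```python
import numpy as np


def monotonicity_level(points):
    """Smallest lam >= 0 such that the finite set S (rows (x, y)) is
    lam-monotonically non-decreasing in the sense of Definition 2:
    the maximum over pairs of (x2 - x1)(y1 - y2), clipped at 0."""
    p = np.asarray(points, float)
    dx = p[None, :, 0] - p[:, None, 0]  # dx[a, b] = x_b - x_a
    dy = p[:, None, 1] - p[None, :, 1]  # dy[a, b] = y_a - y_b
    return max(float((dx * dy).max()), 0.0)


def is_lambda_monotone(points, lam):
    return monotonicity_level(points) <= lam


graph = [(0, 0), (1, 0.5), (2, 2), (3, 4)]  # non-decreasing, hence 0-monotone
crossing = [(0, 1), (1, 0)]                 # one crossing pair spanning area 1
print(monotonicity_level(graph), monotonicity_level(crossing))
```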

Proposition 3: Given $\mu_0,\mu_1\in\mathcal{F}_n$, let $m$ be the optimal transportation plan in (9) with $\lambda > 0$. Then $m$ has support on at most a $(4\cdot\lambda)$-monotonically non-decreasing set.

Proof: See the Appendix. ■

The optimal transportation cost $\mathcal{T}_{2,\lambda}(\mu_0,\mu_1)$ satisfies the following properties:

1) $\mathcal{T}_{2,\lambda}(\mu_0,\mu_1) = \mathcal{T}_{2,\lambda}(\mu_1,\mu_0)$,

2) $\mathcal{T}_{2,\lambda}(\mu_0,\mu_1) \ge 0$,

3) $\mathcal{T}_{2,\lambda}(\mu_0,\mu_1) = 0$ if and only if $\mu_0 = \mu_1$.

Thus, although $\mathcal{T}_{2,\lambda}(\mu_0,\mu_1)$ can be used to compare matrix-valued densities, it is not a metric, and neither is $\mathcal{T}_{2,\lambda}^{1/2}$, since the triangle inequality does not hold in general. We will introduce a slightly different formulation of a transportation problem which does give rise to a metric.

A. Optimal transport on a subset 

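In this subsection, we restrict attention to a certain subset of the transport plans in $M(\mu_0,\mu_1)$ and show that the corresponding optimal transportation cost induces a metric. More specifically, let

$$M_0(\mu_0,\mu_1) := \left\{ m \;\middle|\; m(x,y) = \mu_0(x)\otimes\mu_1(y)\,a(x,y),\ m \in M(\mu_0,\mu_1) \right\},$$

where $a(x,y) \ge 0$ is scalar-valued. For $m(x,y) \in M_0(\mu_0,\mu_1)$,

$$\widetilde{\operatorname{tr}}_0(m(x,y)) = \mu_1(y)/\operatorname{tr}(\mu_1(y)), \qquad \widetilde{\operatorname{tr}}_1(m(x,y)) = \mu_0(x)/\operatorname{tr}(\mu_0(x)).$$

Given $\mu_0$ and $\mu_1$, the "orientation" of the mass of $m(x,y)$ is fixed. Thus, in this case, the optimal transportation cost is

$$\tilde{\mathcal{T}}_{2,\lambda}(\mu_0,\mu_1) := \min_{m\in M_0(\mu_0,\mu_1)} \int_{\mathbb{R}\times\mathbb{R}} \operatorname{tr}\Big(\big(c + \lambda\|(\widetilde{\operatorname{tr}}_0 - \widetilde{\operatorname{tr}}_1)m(x,y)\|_F^2\big)\,m\Big)\,dx\,dy. \tag{11}$$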



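Proposition 4: For $\tilde{\mathcal{T}}_{2,\lambda}$ as in (11) and $\mu_0,\mu_1\in\mathcal{F}_n$,

$$d_{2,\lambda}(\mu_0,\mu_1) := \big(\tilde{\mathcal{T}}_{2,\lambda}(\mu_0,\mu_1)\big)^{1/2} \tag{12}$$

defines a metric on $\mathcal{F}_n$.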

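Proof: It is straightforward to prove that

$$d_{2,\lambda}(\mu_0,\mu_1) = d_{2,\lambda}(\mu_1,\mu_0) \ge 0$$

and that $d_{2,\lambda}(\mu_0,\mu_1) = 0$ if and only if $\mu_0 = \mu_1$. We will show that the triangle inequality also holds. For $\mu_0,\mu_1,\mu_2 \in \mathcal{F}_n$, let

$$\mathbf{m}_{01}(x,y) = \frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)}\otimes\frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)}\,m_{01}(x,y), \qquad \mathbf{m}_{12}(y,z) = \frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)}\otimes\frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\,m_{12}(y,z)$$

be the optimal transportation plans for the pairs $(\mu_0,\mu_1)$ and $(\mu_1,\mu_2)$, respectively, where $m_{01}$ and $m_{12}$ are two (scalar-valued) joint densities on $\mathbb{R}^2$ with marginals $\operatorname{tr}(\mu_0)$, $\operatorname{tr}(\mu_1)$ and $\operatorname{tr}(\mu_1)$, $\operatorname{tr}(\mu_2)$, respectively. Given $m_{01}(x,y)$ and $m_{12}(y,z)$, there is a joint density function $m(x,y,z)$ on $\mathbb{R}^3$ with $m_{01}$ and $m_{12}$ as the marginals on the corresponding subspaces [3, page 208]. We denote

$$\mathbf{m}(x,y,z) = \frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)}\otimes\frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)}\otimes\frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\,m(x,y,z);$$

then it has $\mathbf{m}_{01}$ and $\mathbf{m}_{12}$ as the matrix-valued marginal distributions.

Now, let $\mathbf{m}_{02}(x,z) = \frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)}\otimes\frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\,m_{02}(x,z)$ be the marginal distribution of $\mathbf{m}(x,y,z)$ when tracing out the $y$-component. Then $\mathbf{m}_{02}(x,z)$ is a candidate transportation plan between $\mu_0$ and $\mu_2$. Thus

$$\begin{aligned}
d_{2,\lambda}(\mu_0,\mu_2) &\le \left(\int \Big((x-z)^2 + \lambda\Big\|\frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)} - \frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\Big\|_F^2\Big)\,m_{02}\,dx\,dz\right)^{1/2} \\
&= \left(\int \Big((x-z)^2 + \lambda\Big\|\frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)} - \frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\Big\|_F^2\Big)\,m\,dx\,dy\,dz\right)^{1/2} \\
&= \left(\int \Big((x-y+y-z)^2 + \lambda\Big\|\frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)} - \frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)} + \frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)} - \frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\Big\|_F^2\Big)\,m\,dx\,dy\,dz\right)^{1/2} \\
&\le \left(\int \Big((x-y)^2 + \lambda\Big\|\frac{\mu_0(x)}{\operatorname{tr}\mu_0(x)} - \frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)}\Big\|_F^2\Big)\,m_{01}\,dx\,dy\right)^{1/2} \\
&\quad + \left(\int \Big((y-z)^2 + \lambda\Big\|\frac{\mu_1(y)}{\operatorname{tr}\mu_1(y)} - \frac{\mu_2(z)}{\operatorname{tr}\mu_2(z)}\Big\|_F^2\Big)\,m_{12}\,dy\,dz\right)^{1/2} \\
&= d_{2,\lambda}(\mu_0,\mu_1) + d_{2,\lambda}(\mu_1,\mu_2),
\end{aligned}$$

where the second inequality follows from the fact that the $L^2$-norm defines a metric (Minkowski's inequality with respect to the measure $m\,dx\,dy\,dz$). ■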
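Because the orientation of every point mass is fixed in $M_0$, our reading of (11) for finitely supported marginals and $c(x,y) = (x-y)^2$ is a scalar Kantorovich linear program over the masses $\operatorname{tr}(m(x_i,y_j))$, with marginals $\operatorname{tr}\mu_0(x_i)$, $\operatorname{tr}\mu_1(y_j)$ and cost $(x_i-y_j)^2 + \lambda\|\mu_0(x_i)/\operatorname{tr}\mu_0(x_i) - \mu_1(y_j)/\operatorname{tr}\mu_1(y_j)\|_F^2$. The sketch below uses that reduction to spot-check Proposition 4 on hypothetical random data. It assumes SciPy's `linprog` with HiGHS, and all helper names are ours.

```python
import numpy as np
from scipy.optimize import linprog


def random_discrete_density(rng, k, n):
    """Hypothetical test data: k point masses at random locations on R with
    random real PSD n x n weights whose traces sum to 1."""
    locs = rng.normal(size=k)
    g = rng.normal(size=(k, n, n))
    w = g @ np.transpose(g, (0, 2, 1))
    return locs, w / np.trace(w, axis1=1, axis2=2).sum()


def d2_lambda(mu0, mu1, lam):
    """d_{2,lambda} of (12) for discrete marginals, via the scalar LP above."""
    (x, P), (y, Q) = mu0, mu1
    p, q = np.trace(P, axis1=1, axis2=2), np.trace(Q, axis1=1, axis2=2)
    Pn, Qn = P / p[:, None, None], Q / q[:, None, None]      # fixed orientations
    rot = ((Pn[:, None] - Qn[None, :]) ** 2).sum(axis=(2, 3))  # ||.||_F^2
    cost = (x[:, None] - y[None, :]) ** 2 + lam * rot
    k0, k1 = len(p), len(q)
    A_eq = np.vstack([np.kron(np.eye(k0), np.ones((1, k1))),
                      np.kron(np.ones((1, k0)), np.eye(k1))])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(np.sqrt(max(res.fun, 0.0)))


rng = np.random.default_rng(1)
mus = [random_discrete_density(rng, 4, 2) for _ in range(3)]
lam = 0.5
d01 = d2_lambda(mus[0], mus[1], lam)
d12 = d2_lambda(mus[1], mus[2], lam)
d02 = d2_lambda(mus[0], mus[2], lam)
print(d01, d12, d02, d02 <= d01 + d12)
```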

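Proposition 5: Given $\mu_0,\mu_1\in\mathcal{F}_n$, let $m$ be the optimal transportation plan in (11); then $m$ has support on at most a $(2\cdot\lambda)$-monotonically non-decreasing set.

Proof: We need to prove that if $m(x_1,y_1) \neq 0$ and $m(x_2,y_2) \neq 0$, then $x_2 > x_1$, $y_1 > y_2$ implies

$$(y_1 - y_2)(x_2 - x_1) \le 2\lambda. \tag{13}$$

Assume that $m$ evaluated at the four points $(x_i,y_j)$, with $i,j\in\{1,2\}$, is as follows

$$m(x_i,y_j) = m_{ij}\cdot A_i\otimes B_j$$

with

$$A_i = \frac{\mu_0(x_i)}{\operatorname{tr}(\mu_0(x_i))}, \qquad B_j = \frac{\mu_1(y_j)}{\operatorname{tr}(\mu_1(y_j))},$$

and $m_{11}, m_{22} > 0$. The steps of the proof are similar to those of Proposition 3: first, we assume that Proposition 5 fails and that

$$(y_1 - y_2)(x_2 - x_1) > 2\lambda.$$

Then we show that a smaller cost can be obtained by rearranging the "mass". Consider the situation when $m_{22} \ge m_{11}$ first and let $\hat m$ be a new transportation plan with

$$\begin{aligned}
\hat m(x_1,y_1) &= 0, \\
\hat m(x_1,y_2) &= (m_{11} + m_{12})\cdot A_1\otimes B_2, \\
\hat m(x_2,y_1) &= (m_{11} + m_{21})\cdot A_2\otimes B_1, \\
\hat m(x_2,y_2) &= (m_{22} - m_{11})\cdot A_2\otimes B_2.
\end{aligned}$$

Then $\hat m$ has the same marginals as $m$ at the four points and the cost incurred by $m$ is

$$\sum_{i=1}^{2}\sum_{j=1}^{2} m_{ij}\big((x_i - y_j)^2 + \lambda\|A_i - B_j\|_F^2\big), \tag{14}$$

while the cost incurred by $\hat m$ is

$$\begin{aligned}
&(m_{11} + m_{12})\big((x_1 - y_2)^2 + \lambda\|A_1 - B_2\|_F^2\big) \\
&+ (m_{11} + m_{21})\big((x_2 - y_1)^2 + \lambda\|A_2 - B_1\|_F^2\big) \\
&+ (m_{22} - m_{11})\big((x_2 - y_2)^2 + \lambda\|A_2 - B_2\|_F^2\big).
\end{aligned} \tag{15}$$

After canceling the common terms, to show that (14) is larger than (15), it suffices to show that

$$2m_{11}(x_2 - x_1)(y_1 - y_2) > \lambda m_{11}\big(\|A_1 - B_2\|_F^2 + \|A_2 - B_1\|_F^2 - \|A_1 - B_1\|_F^2 - \|A_2 - B_2\|_F^2\big).$$

The above holds since $2m_{11}(x_2 - x_1)(y_1 - y_2) > 4\lambda m_{11}$ by assumption, whereas the right-hand side is at most $\lambda m_{11}\big(\|A_1 - B_2\|_F^2 + \|A_2 - B_1\|_F^2\big) \le 4\lambda m_{11}$, because $\|A - B\|_F^2 \le 2$ for $A, B \ge 0$ with $\operatorname{tr}(A) = \operatorname{tr}(B) = 1$ (see the Appendix). The case $m_{11} > m_{22}$ proceeds similarly. ■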
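The rearrangement in this proof can be checked numerically: the cost drop from (14) to (15) equals $m_{11}$ times the bracket compared above, and it is positive when the crossing area exceeds $2\lambda$. The sketch below uses hypothetical data (random unit-trace orientations $A_i$, $B_j$ and masses with $m_{22} \ge m_{11}$); since the orientations are fixed per point in $M_0$, matching scalar row and column sums also matches the matrix-valued marginals.

```python
import numpy as np

rng = np.random.default_rng(4)


def density(n=2):
    """Random real unit-trace PSD matrix (hypothetical test data)."""
    g = rng.normal(size=(n, n))
    a = g @ g.T
    return a / np.trace(a)


def sqf(a, b):
    """Squared Frobenius norm ||a - b||_F^2."""
    return float(np.sum((a - b) ** 2))


lam = 0.3
x, y = [0.0, 1.0], [1.0, 0.0]   # crossing pair: (x2 - x1)(y1 - y2) = 1 > 2 * lam
A = [density(), density()]      # A_i = mu_0(x_i) / tr mu_0(x_i)
B = [density(), density()]      # B_j = mu_1(y_j) / tr mu_1(y_j)
m = np.array([[0.2, 0.1],
              [0.15, 0.35]])    # masses m_ij, case m22 >= m11


def cost(w):
    """Cost (14) of a plan in M_0 with scalar masses w_ij at the four points."""
    return sum(w[i, j] * ((x[i] - y[j]) ** 2 + lam * sqf(A[i], B[j]))
               for i in range(2) for j in range(2))


m11 = m[0, 0]
m_hat = m + m11 * np.array([[-1.0, 1.0], [1.0, -1.0]])  # the rearranged plan
print(cost(m) - cost(m_hat))
```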



V. Example 



We highlight the relevance of the matrix-valued OMT to spectral analysis by presenting a numerical example of spectral morphing. The idea is to model slowly time-varying changes in the spectral domain by geodesics in a suitable geometry (see, e.g., [6], [?]). The importance of OMT stems from the fact that it induces a weakly continuous metric. Thereby, geodesics smoothly shift spectral power across frequencies, lessening the possibility of a fade-in fade-out phenomenon. The classical theory of OMT allows constructing such geodesics for scalar-valued distributions. The example below demonstrates that we can now have an analogous construction of geodesics of matrix-valued power spectra as well.

Starting with $\mu_0,\mu_1\in\mathcal{F}_n$, we approximate the geodesic between them by identifying $N-1$ points between the two. More specifically, we set $\mu_{\tau_0} = \mu_0$ and $\mu_{\tau_N} = \mu_1$, and determine $\mu_{\tau_k}\in\mathcal{F}_n$ for $k = 1,\ldots,N-1$ by solving

$$\min_{\mu_{\tau_1},\ldots,\mu_{\tau_{N-1}}} \sum_{k=0}^{N-1} \mathcal{T}_{2,\lambda}(\mu_{\tau_k},\mu_{\tau_{k+1}}). \tag{16}$$

As noted in Section III-D, numerically this can be solved via a convex programming problem. The numerical example is based on the following two matrix-valued power spectral densities

…

shown in Figure 1. The value of a power spectral density at each point in frequency is a $2\times 2$ Hermitian matrix. Hence, the (1,1), (1,2), and (2,2) subplots display the magnitude of the corresponding entries, i.e., $|\mu(1,1)|$, $|\mu(1,2)|$ ($= |\mu(2,1)|$) and $|\mu(2,2)|$, respectively. The (2,1) subplot displays the phase $\angle\mu(1,2)$ ($= -\angle\mu(2,1)$).

The three-dimensional plots in Figure 2 show the solution of (16) with $\lambda = 0.1$, which is an approximation of a geodesic. The two boundary plots represent the power spectra $\mu_0$ and $\mu_1$, shown in blue and red, respectively, using the same convention for magnitudes and phases. In total, 7 power spectra $\mu_{\tau_k}$, $k = 1,\ldots,7$, are shown along the geodesic between $\mu_0$ and $\mu_1$, and the time indices correspond to $\tau_k = k/8$. It is interesting to observe the smooth shift of the energy from one "channel" to the other over the geodesic path while the peak shifts from one frequency to another.
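To make the role of (16) concrete, the sketch below uses the scalar analogue of the chain objective, with $W_2^2$ in place of $\mathcal{T}_{2,\lambda}$ and equal-size empirical measures on $\mathbb{R}$ (both assumptions; it does not solve the matrix-valued problem). In this scalar setting the minimum of $\sum_k W_2^2(\mu_{\tau_k},\mu_{\tau_{k+1}})$ equals $W_2^2(\mu_0,\mu_1)/N$ and is attained by equally spaced displacement interpolants $\tau_k = k/N$; moving one interior point off the geodesic increases the objective.

```python
import numpy as np


def w2_sq(a, b):
    """Squared W2 between equal-size, equally weighted empirical measures on R."""
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))


def chain_cost(points):
    """Scalar analogue of the objective in (16): sum_k W2^2(mu_k, mu_{k+1})."""
    return sum(w2_sq(points[k], points[k + 1]) for k in range(len(points) - 1))


rng = np.random.default_rng(5)
a = np.sort(rng.normal(0.0, 1.0, 200))   # atoms of mu_0 (hypothetical data)
b = np.sort(rng.normal(4.0, 0.5, 200))   # atoms of mu_1 (hypothetical data)
N = 8
geodesic = [(1 - k / N) * a + (k / N) * b for k in range(N + 1)]  # tau_k = k/N
perturbed = [p.copy() for p in geodesic]
perturbed[3] = perturbed[3] + 0.1 * rng.normal(size=200)  # move one interior point

print(chain_cost(geodesic), w2_sq(a, b) / N, chain_cost(perturbed))
```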
Fig. 1. Subplots (1,1), (1,2) and (2,2) show $\mu_i(1,1)$, $|\mu_i(1,2)|$ (same as $|\mu_i(2,1)|$) and $\mu_i(2,2)$. Subplot (2,1) shows $\angle(\mu_i(2,1))$ for $i \in \{0,1\}$ in blue and red, respectively.



VI. Conclusions 

This paper considers the optimal mass transportation problem of matrix-valued densities. This is motivated by the need for a suitable topology for the spectral analysis of multivariable time-series. It is well known that the OMT between scalar densities induces a Riemannian metric [?], [10] (see also [11] for a systems viewpoint and connections to image analysis and metrics on power spectra). Our interest has been in extending such a Riemannian structure to matrix-valued densities. Thus, we formulate a "non-commutative" version of the Monge-Kantorovich transportation problem which can be cast as a convex-optimization problem. Interestingly, in contrast to the scalar case, the optimal transport plan is no longer supported on a set of measure zero. Versions of non-commutative Monge-Kantorovich transportation have been studied in the context of free probability [12]. The relation of that to our formulation is still unclear. Finally, we note that if the matrix-valued distributions commute, then it is easy to check that our set-up reduces to that of a number of scalar problems, which is also the case in [12].

Fig. 2. The interpolated results $\mu_{\tau_k}$ for $k = 0,\ldots,8$ computed from (16) with $\mu_0$ and $\mu_1$ as the two boundary points: subplots (1,1), (1,2) and (2,2) show $\mu_{\tau_k}(1,1)$, $|\mu_{\tau_k}(1,2)|$ (same as $|\mu_{\tau_k}(2,1)|$) and $\mu_{\tau_k}(2,2)$; subplot (2,1) shows $\angle(\mu_{\tau_k}(2,1))$.



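VII. Appendix: Proof of Proposition 3
We need to prove that if $m(x_1,y_1) \neq 0$ and $m(x_2,y_2) \neq 0$, then $x_2 > x_1$, $y_1 > y_2$ implies

$$(x_2 - x_1)(y_1 - y_2) \le 4\lambda. \tag{17}$$

Without loss of generality, let

$$m(x_i,y_j) = m_{ij}\cdot A_{ij}\otimes B_{ij} \tag{18}$$

with $A_{ij}, B_{ij} \ge 0$, $\operatorname{tr}(A_{ij}) = \operatorname{tr}(B_{ij}) = 1$ and $i,j\in\{1,2\}$. (There is no loss of generality because the constraints (7) and the cost in (9) depend on $m(x,y)$ only through its partial traces, and $\operatorname{tr}_1(m)\otimes\operatorname{tr}_0(m)/\operatorname{tr}(m)$ has the same partial traces as $m$.) Note that $m_{12}$ and $m_{21}$ could be zero if $m$ does not have support on the particular point. We assume that the condition in the proposition fails and

$$(x_2 - x_1)(y_1 - y_2) > 4\lambda; \tag{19}$$

then we show that by rearranging mass the cost can be reduced.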

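We first consider the situation when $m_{22} \ge m_{11}$. By rearranging the value of $m$ at the four points $(x_i,y_j)$ with $i,j\in\{1,2\}$, we construct a new transportation plan $\hat m$ at these four locations as follows

$$\begin{aligned}
\hat m(x_1,y_1) &= 0, && \text{(20a)}\\
\hat m(x_1,y_2) &= (m_{11} + m_{12})\cdot \hat A_{12}\otimes\hat B_{12}, && \text{(20b)}\\
\hat m(x_2,y_1) &= (m_{11} + m_{21})\cdot \hat A_{21}\otimes\hat B_{21}, && \text{(20c)}\\
\hat m(x_2,y_2) &= (m_{22} - m_{11})\cdot A_{22}\otimes B_{22}, && \text{(20d)}
\end{aligned}$$

where

$$\hat A_{12} = \frac{m_{11}A_{11} + m_{12}A_{12}}{m_{11} + m_{12}}, \qquad \hat B_{12} = \frac{m_{11}B_{22} + m_{12}B_{12}}{m_{11} + m_{12}},$$

$$\hat A_{21} = \frac{m_{11}A_{22} + m_{21}A_{21}}{m_{11} + m_{21}}, \qquad \hat B_{21} = \frac{m_{11}B_{11} + m_{21}B_{21}}{m_{11} + m_{21}}.$$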

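This new transportation plan $\hat m$ has the same marginals as $m$ at $x_1, x_2$ and $y_1, y_2$. The original cost incurred by $m$ at these four locations is

$$\sum_{i=1}^{2}\sum_{j=1}^{2} m_{ij}\big((x_i - y_j)^2 + \lambda\|A_{ij} - B_{ij}\|_F^2\big), \tag{21}$$

while the cost incurred by $\hat m$ is

$$\begin{aligned}
&(m_{11} + m_{12})\big((x_1 - y_2)^2 + \lambda\|\hat A_{12} - \hat B_{12}\|_F^2\big) \\
&+ (m_{11} + m_{21})\big((x_2 - y_1)^2 + \lambda\|\hat A_{21} - \hat B_{21}\|_F^2\big) \\
&+ (m_{22} - m_{11})\big((x_2 - y_2)^2 + \lambda\|A_{22} - B_{22}\|_F^2\big).
\end{aligned} \tag{22}$$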

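After simplification, to show that (21) is larger than (22), it suffices to show that

$$2m_{11}(x_2 - x_1)(y_1 - y_2) \tag{23}$$

is larger than

$$\begin{aligned}
&\lambda m_{11}\Big(\sum_{i\neq j}\|\hat A_{ij} - \hat B_{ij}\|_F^2 - \sum_{i=1}^{2}\|A_{ii} - B_{ii}\|_F^2\Big) && \text{(24a)}\\
&+ \lambda m_{12}\big(\|\hat A_{12} - \hat B_{12}\|_F^2 - \|A_{12} - B_{12}\|_F^2\big) && \text{(24b)}\\
&+ \lambda m_{21}\big(\|\hat A_{21} - \hat B_{21}\|_F^2 - \|A_{21} - B_{21}\|_F^2\big). && \text{(24c)}
\end{aligned}$$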

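From the assumption in (19), the value of (23) is larger than $8\lambda m_{11}$. We derive upper bounds for each term in (24). First,

$$\text{(24a)} \le \lambda m_{11}\big(\|\hat A_{12} - \hat B_{12}\|_F^2 + \|\hat A_{21} - \hat B_{21}\|_F^2\big) \le 4\lambda m_{11},$$

where the last inequality follows from the fact that for $A, B \ge 0$ and $\operatorname{tr}(A) = \operatorname{tr}(B) = 1$ (using $\operatorname{tr}(AB) \ge 0$ and $\operatorname{tr}(A^2) \le \operatorname{tr}(A)^2 = 1$),

$$\|A - B\|_F^2 = \operatorname{tr}(A^2 - 2AB + B^2) \le \operatorname{tr}(A^2 + B^2) \le 2.$$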
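The bound $\|A - B\|_F^2 \le 2$ for unit-trace PSD matrices is used repeatedly below. A quick randomized check, plus an instance where it is attained, as a sketch with hypothetical random data:

```python
import numpy as np


def random_density(rng, n):
    """Random Hermitian PSD n x n matrix with unit trace (hypothetical data)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    a = g @ g.conj().T
    return a / np.trace(a).real


rng = np.random.default_rng(6)
largest = 0.0
for _ in range(2000):
    n = int(rng.integers(2, 6))
    a, b = random_density(rng, n), random_density(rng, n)
    largest = max(largest, np.linalg.norm(a - b, "fro") ** 2)

# The bound is attained by orthogonal rank-one projections.
extreme = np.linalg.norm(np.diag([1.0, 0.0]) - np.diag([0.0, 1.0]), "fro") ** 2
print(largest, extreme)
```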
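For an upper bound of (24b),

$$\begin{aligned}
&\|\hat A_{12} - \hat B_{12}\|_F^2 - \|A_{12} - B_{12}\|_F^2 \\
&= \operatorname{tr}\big((\hat A_{12} - \hat B_{12} + A_{12} - B_{12})(\hat A_{12} - \hat B_{12} - A_{12} + B_{12})\big) \\
&= \frac{m_{11}}{m_{11} + m_{12}}\Big(\|A_{11} - B_{22}\|_F^2 - \|A_{12} - B_{12}\|_F^2 - \frac{m_{12}}{m_{11} + m_{12}}\|A_{11} - B_{22} - A_{12} + B_{12}\|_F^2\Big) \\
&\le \frac{m_{11}}{m_{11} + m_{12}}\|A_{11} - B_{22}\|_F^2 \\
&\le \frac{2m_{11}}{m_{11} + m_{12}},
\end{aligned}$$

where the second equality follows from the definition of $\hat A_{12}$ and $\hat B_{12}$, the first inequality drops the non-positive terms, and the last inequality uses $\|A - B\|_F^2 \le 2$ again. Thus

$$\text{(24b)} \le 2\lambda m_{12}\,\frac{m_{11}}{m_{11} + m_{12}} \le 2\lambda m_{11}.$$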

In a similar manner, (24c) $\le 2\lambda m_{11}$. Therefore,

$$\text{(24)} \le 8\lambda m_{11} < \text{(23)},$$

which implies that the cost incurred by $\hat m$ is smaller than the cost incurred by $m$.
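The construction (20a)-(20d) can be checked numerically: it preserves the matrix-valued marginals at $x_1, x_2$ and $y_1, y_2$, and under (19) the cost (22) is below (21). The sketch below uses hypothetical random orientations $A_{ij}$, $B_{ij}$ and masses with $m_{22} \ge m_{11}$; it illustrates the argument and is not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(7)


def density(n=2):
    """Random real unit-trace PSD matrix (hypothetical test data)."""
    g = rng.normal(size=(n, n))
    a = g @ g.T
    return a / np.trace(a)


def sqf(a, b):
    """Squared Frobenius norm ||a - b||_F^2."""
    return float(np.sum((a - b) ** 2))


lam = 0.2
x, y = [0.0, 1.0], [1.0, 0.0]   # (x2 - x1)(y1 - y2) = 1 > 4 * lam, i.e. (19)
A = {(i, j): density() for i in range(2) for j in range(2)}
B = {(i, j): density() for i in range(2) for j in range(2)}
w = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.35}  # m_ij, m22 >= m11
m11, m12, m21, m22 = w[0, 0], w[0, 1], w[1, 0], w[1, 1]

# Each plan maps a point (i, j) to (mass, A factor, B factor).
old = {k: (w[k], A[k], B[k]) for k in w}
hat = {  # (20b)-(20d); (20a) is the zero entry at (x_1, y_1)
    (0, 1): (m11 + m12, (m11 * A[0, 0] + m12 * A[0, 1]) / (m11 + m12),
             (m11 * B[1, 1] + m12 * B[0, 1]) / (m11 + m12)),
    (1, 0): (m11 + m21, (m11 * A[1, 1] + m21 * A[1, 0]) / (m11 + m21),
             (m11 * B[0, 0] + m21 * B[1, 0]) / (m11 + m21)),
    (1, 1): (m22 - m11, A[1, 1], B[1, 1]),
}


def cost(plan):
    """(21) or (22): sum of mass * ((x_i - y_j)^2 + lam * ||A - B||_F^2)."""
    return sum(mass * ((x[i] - y[j]) ** 2 + lam * sqf(a, b))
               for (i, j), (mass, a, b) in plan.items())


def marginals(plan):
    """Partial-trace marginals at x_1, x_2 and y_1, y_2 (all factors have unit trace)."""
    mx = [sum(mass * a for (i, j), (mass, a, b) in plan.items() if i == r) for r in range(2)]
    my = [sum(mass * b for (i, j), (mass, a, b) in plan.items() if j == s) for s in range(2)]
    return mx, my


print(cost(old), cost(hat))
```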

For the case where $m_{11} > m_{22}$, we can prove the claim by constructing a new transportation plan $\hat m$ with values

$$\begin{aligned}
\hat m(x_1,y_1) &= (m_{11} - m_{22})\cdot A_{11}\otimes B_{11},\\
\hat m(x_1,y_2) &= (m_{12} + m_{22})\cdot \hat A_{12}\otimes\hat B_{12},\\
\hat m(x_2,y_1) &= (m_{21} + m_{22})\cdot \hat A_{21}\otimes\hat B_{21},\\
\hat m(x_2,y_2) &= 0,
\end{aligned}$$

with

$$\hat A_{12} = \frac{m_{12}A_{12} + m_{22}A_{11}}{m_{12} + m_{22}}, \qquad \hat B_{12} = \frac{m_{12}B_{12} + m_{22}B_{22}}{m_{12} + m_{22}},$$

$$\hat A_{21} = \frac{m_{21}A_{21} + m_{22}A_{22}}{m_{21} + m_{22}}, \qquad \hat B_{21} = \frac{m_{21}B_{21} + m_{22}B_{11}}{m_{21} + m_{22}}.$$

The rest of the proof is carried out in a similar manner. ■